Keyword spotting for highly inflectional languages
Abstract
This paper presents our new keyword spotting system, which takes advantage of both the filler model and the confidence measure approaches. The novelty lies in a non-standard connection of the filler and keyword models, together with the introduction of a new confidence measure based on a keyword normalized score. The paper deals in detail with the decision block, for which two methods are introduced. The first compares the keyword normalized score with a predefined decision threshold. The second uses a three-layer feed-forward neural network to decide whether or not the keyword was spoken. Results from both methods are compared with a large vocabulary continuous speech recognition (LVCSR) system used for keyword spotting. LVCSR with a suitable language model can give better results, but it has higher CPU and memory demands. Furthermore, in many common situations spontaneous language is mostly unconstrained and includes out-of-vocabulary (OOV) words (keywords) such as names of people, companies, places, products, etc., so obtaining an appropriate language model is very problematic.
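The first decision method can be illustrated with a minimal sketch. The abstract does not define the keyword normalized score, so the per-frame log-likelihood ratio between the keyword model and the filler model used here is an assumption, and the names `Detection`, `normalized_score`, and `accept` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One putative keyword hit (hypothetical structure, not from the paper)."""
    keyword: str
    keyword_loglik: float  # log-likelihood of the segment under the keyword model
    filler_loglik: float   # log-likelihood of the same segment under the filler model
    n_frames: int          # segment length in frames


def normalized_score(d: Detection) -> float:
    # ASSUMPTION: per-frame log-likelihood ratio of keyword vs. filler model.
    # The paper's actual normalization is not given in the abstract.
    if d.n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (d.keyword_loglik - d.filler_loglik) / d.n_frames


def accept(d: Detection, threshold: float) -> bool:
    # Accept the hit when the normalized score reaches the decision threshold.
    return normalized_score(d) >= threshold


hit = Detection("praha", keyword_loglik=-420.0, filler_loglik=-450.0, n_frames=60)
print(normalized_score(hit), accept(hit, threshold=0.3))
```

Under this assumed definition, raising the threshold trades missed keywords for fewer false alarms. The second method would replace `accept` with a trained classifier.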
Similar resources
Multi-grained alignment of parallel texts with endogenous resources
This paper deals with the spotting of multi-grained translation equivalents in parallel corpora. The idea is to contribute to the processing of languages for which few linguistic resources are available. We pay particular attention to the handling of highly inflectional languages. Our approach is endogenous: it does not require external linguistic resources such as stemmers or taggers.
Using phonological phrase segmentation to improve automatic keyword spotting for the highly agglutinating Hungarian language
This paper investigates the use of prosody to improve keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be performed effectively using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore, the applied keyword spotting system is based on...
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach to document image retrieval based on keyword spotting is proposed, with a framework that uses relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...
Unsupervised Spoken Keyword Spotting and Learning of Acoustically Meaningful Units
The problem of keyword spotting in audio data has been explored for many years. Typically, researchers use supervised methods to train statistical models to detect keyword instances. However, such supervised methods require large quantities of annotated data, which are unlikely to be available for the majority of languages in the world. This thesis addresses this lack-of-annotation problem and pres...
Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
For languages with limited training resources, out-of-vocabulary (OOV) words are a significant problem, both for transcription and keyword spotting. This paper investigates the use of subword lexical units for keyword spotting. Three strategies for using the subword units are explored: 1) converting word-based lattices to subword lattices after decoding, 2) performing a separate decoding for ea...
Publication date: 2004